Overview
Dataset statistics
| | Dataset A | Dataset B |
|---|---|---|
| Number of variables | 12 | 12 |
| Number of observations | 446 | 446 |
| Missing cells | 423 | 432 |
| Missing cells (%) | 7.9% | 8.1% |
| Duplicate rows | 0 | 0 |
| Duplicate rows (%) | 0.0% | 0.0% |
| Total size in memory | 45.3 KiB | 45.3 KiB |
| Average record size in memory | 104.0 B | 104.0 B |
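The percentage and per-record rows above can be recomputed from the raw counts. The sketch below assumes that "Missing cells (%)" divides by observations × variables and that 1 KiB is 1024 bytes; the function names are illustrative and are not part of ydata-profiling.

```python
# Figures copied from the Dataset statistics table above.
N_VARIABLES = 12
N_OBSERVATIONS = 446
MISSING_CELLS = {"Dataset A": 423, "Dataset B": 432}
TOTAL_SIZE_KIB = 45.3


def missing_cells_pct(missing: int, rows: int, cols: int) -> float:
    """Share of empty cells among all rows * cols cells, in percent."""
    return 100 * missing / (rows * cols)


def avg_record_size_bytes(total_kib: float, rows: int) -> float:
    """Average bytes per row, assuming 1 KiB = 1024 bytes."""
    return total_kib * 1024 / rows


for name, missing in MISSING_CELLS.items():
    pct = missing_cells_pct(missing, N_OBSERVATIONS, N_VARIABLES)
    print(f"{name}: {pct:.1f}% missing cells")

size = avg_record_size_bytes(TOTAL_SIZE_KIB, N_OBSERVATIONS)
print(f"Average record size: {size:.1f} B")
```

Under these assumptions the results round to the 7.9%, 8.1% and 104.0 B shown in the table.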
Variable types
| | Dataset A | Dataset B |
|---|---|---|
| Numeric | 5 | 5 |
| Categorical | 4 | 4 |
| Text | 3 | 3 |
Alerts
| Dataset A | Dataset B | Alert type |
|---|---|---|
| Sex is highly overall correlated with Survived | Sex is highly overall correlated with Survived | High correlation |
| Survived is highly overall correlated with Sex | Survived is highly overall correlated with Sex | High correlation |
| Age has 81 (18.2%) missing values | Age has 86 (19.3%) missing values | Missing |
| Cabin has 340 (76.2%) missing values | Cabin has 345 (77.4%) missing values | Missing |
| PassengerId has unique values | PassengerId has unique values | Unique |
| Name has unique values | Name has unique values | Unique |
| SibSp has 299 (67.0%) zeros | SibSp has 305 (68.4%) zeros | Zeros |
| Parch has 345 (77.4%) zeros | Parch has 345 (77.4%) zeros | Zeros |
| Fare has 8 (1.8%) zeros | Fare has 10 (2.2%) zeros | Zeros |
Reproduction
| | Dataset A | Dataset B |
|---|---|---|
| Analysis started | 2025-03-18 20:27:21.796862 | 2025-03-18 20:27:23.908053 |
| Analysis finished | 2025-03-18 20:27:23.905312 | 2025-03-18 20:27:25.990366 |
| Duration | 2.11 seconds | 2.08 seconds |
| Software version | ydata-profiling v0.0.dev0 | ydata-profiling v0.0.dev0 |
| Download configuration | config.json | config.json |
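The Duration row follows from the start and finish timestamps. The sketch below uses only the standard library to recompute it; the timestamp format string and the function name are assumptions made for illustration.

```python
from datetime import datetime

# Assumed layout of the timestamps shown in the Reproduction table.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

# (started, finished) pairs copied from the table above.
RUNS = {
    "Dataset A": ("2025-03-18 20:27:21.796862", "2025-03-18 20:27:23.905312"),
    "Dataset B": ("2025-03-18 20:27:23.908053", "2025-03-18 20:27:25.990366"),
}


def duration_seconds(started: str, finished: str) -> float:
    """Elapsed wall-clock time between two timestamps, in seconds."""
    start = datetime.strptime(started, TIMESTAMP_FORMAT)
    end = datetime.strptime(finished, TIMESTAMP_FORMAT)
    return (end - start).total_seconds()


for name, (started, finished) in RUNS.items():
    print(f"{name}: {duration_seconds(started, finished):.2f} seconds")
```

The timestamps also show that the Dataset B analysis started only after the Dataset A analysis had finished, so the two profiles appear to have been computed one after the other.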
Variables
PassengerId
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 444.0426 | 446.19283 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 2 | 3 |
| Maximum | 891 | 891 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 2 | 3 |
| 5-th percentile | 48.5 | 59.5 |
| Q1 | 231.5 | 235.25 |
| median | 428.5 | 446.5 |
| Q3 | 653.75 | 649.75 |
| 95-th percentile | 853.75 | 845.5 |
| Maximum | 891 | 891 |
| Range | 889 | 888 |
| Interquartile range (IQR) | 422.25 | 414.5 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 253.55235 | 247.19408 |
| Coefficient of variation (CV) | 0.57100907 | 0.55400731 |
| Kurtosis | -1.1706937 | -1.0945158 |
| Mean | 444.0426 | 446.19283 |
| Median Absolute Deviation (MAD) | 210 | 209 |
| Skewness | 0.036204644 | 0.034319144 |
| Sum | 198043 | 199002 |
| Variance | 64288.796 | 61104.916 |
| Monotonicity | Not monotonic | Not monotonic |
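Several rows in the PassengerId tables are derived from others: Range from the minimum and maximum, IQR from the quartiles, CV and Variance from the standard deviation and mean. As a consistency check, the sketch below recomputes them from the reported values; the dictionary layout and the function name are illustrative only.

```python
import math

# Primary values copied from the PassengerId tables above.
REPORTED = {
    "Dataset A": {"min": 2, "max": 891, "q1": 231.5, "q3": 653.75,
                  "mean": 444.0426, "std": 253.55235},
    "Dataset B": {"min": 3, "max": 891, "q1": 235.25, "q3": 649.75,
                  "mean": 446.19283, "std": 247.19408},
}


def derived_stats(s: dict) -> dict:
    """Recompute the derived rows from the primary ones."""
    return {
        "range": s["max"] - s["min"],
        "iqr": s["q3"] - s["q1"],          # interquartile range
        "cv": s["std"] / s["mean"],        # coefficient of variation
        "variance": s["std"] ** 2,
    }


for name, stats in REPORTED.items():
    d = derived_stats(stats)
    print(f"{name}: range={d['range']}, IQR={d['iqr']}, "
          f"CV={d['cv']:.4f}, variance={d['variance']:.1f}")

# The recomputed CV agrees with the reported one up to rounding.
assert math.isclose(derived_stats(REPORTED["Dataset A"])["cv"], 0.57100907, rel_tol=1e-6)
```

The recomputed values match the table up to rounding of the printed inputs.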
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 350 | 1 | 0.2% |
| 357 | 1 | 0.2% |
| 755 | 1 | 0.2% |
| 159 | 1 | 0.2% |
| 406 | 1 | 0.2% |
| 118 | 1 | 0.2% |
| 545 | 1 | 0.2% |
| 764 | 1 | 0.2% |
| 533 | 1 | 0.2% |
| 247 | 1 | 0.2% |
| Other values (436) | 436 |
| Value | Count | Frequency (%) |
| 90 | 1 | 0.2% |
| 572 | 1 | 0.2% |
| 103 | 1 | 0.2% |
| 707 | 1 | 0.2% |
| 205 | 1 | 0.2% |
| 134 | 1 | 0.2% |
| 198 | 1 | 0.2% |
| 104 | 1 | 0.2% |
| 676 | 1 | 0.2% |
| 653 | 1 | 0.2% |
| Other values (436) | 436 |
| Value | Count | Frequency (%) |
| 2 | 1 | |
| 4 | 1 | |
| 5 | 1 | |
| 8 | 1 | |
| 13 | 1 | |
| 16 | 1 | |
| 18 | 1 | |
| 19 | 1 | |
| 21 | 1 | |
| 22 | 1 |
| Value | Count | Frequency (%) |
| 3 | 1 | |
| 5 | 1 | |
| 6 | 1 | |
| 12 | 1 | |
| 13 | 1 | |
| 18 | 1 | |
| 20 | 1 | |
| 23 | 1 | |
| 24 | 1 | |
| 29 | 1 |
Survived
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 1 | 1 |
| 2nd row | 1 | 0 |
| 3rd row | 0 | 1 |
| 4th row | 0 | 1 |
| 5th row | 0 | 1 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 272 | |
| 1 | 174 |
| Value | Count | Frequency (%) |
| 0 | 288 | |
| 1 | 158 |
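The Frequency (%) cells in the two Survived tables above are empty, but they can be derived from the counts. The sketch below assumes the first table belongs to Dataset A and the second to Dataset B (the order used elsewhere in this report) and that each frequency is the count divided by the 446 observations; the function name is illustrative.

```python
# Counts copied from the two Survived "Common Values" tables above.
SURVIVED_COUNTS = {
    "Dataset A": {"0": 272, "1": 174},
    "Dataset B": {"0": 288, "1": 158},
}


def frequencies_pct(counts: dict) -> dict:
    """Convert absolute counts to percentages of the column total."""
    total = sum(counts.values())
    return {value: round(100 * n / total, 1) for value, n in counts.items()}


for name, counts in SURVIVED_COUNTS.items():
    print(name, frequencies_pct(counts))
```

Under these assumptions, about 39.0% of passengers in Dataset A survived, against about 35.4% in Dataset B.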
Length
Histogram of lengths of the category
Common Values (Plot)
Dataset A
Dataset B
| Value | Count | Frequency (%) |
| 0 | 272 | |
| 1 | 174 |
| Value | Count | Frequency (%) |
| 0 | 288 | |
| 1 | 158 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 272 | |
| 1 | 174 |
| Value | Count | Frequency (%) |
| 0 | 288 | |
| 1 | 158 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 0 | 272 | |
| 1 | 174 |
| Value | Count | Frequency (%) |
| 0 | 288 | |
| 1 | 158 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 0 | 272 | |
| 1 | 174 |
| Value | Count | Frequency (%) |
| 0 | 288 | |
| 1 | 158 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 0 | 272 | |
| 1 | 174 |
| Value | Count | Frequency (%) |
| 0 | 288 | |
| 1 | 158 |
Pclass
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 1 | 1 |
| 2nd row | 2 | 1 |
| 3rd row | 3 | 2 |
| 4th row | 2 | 3 |
| 5th row | 2 | 2 |
Common Values
| Value | Count | Frequency (%) |
| 3 | 231 | |
| 1 | 112 | |
| 2 | 103 |
| Value | Count | Frequency (%) |
| 3 | 237 | |
| 1 | 111 | |
| 2 | 98 |
Length
Histogram of lengths of the category
Common Values (Plot)
Dataset A
Dataset B
| Value | Count | Frequency (%) |
| 3 | 231 | |
| 1 | 112 | |
| 2 | 103 |
| Value | Count | Frequency (%) |
| 3 | 237 | |
| 1 | 111 | |
| 2 | 98 |
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 231 | |
| 1 | 112 | |
| 2 | 103 |
| Value | Count | Frequency (%) |
| 3 | 237 | |
| 1 | 111 | |
| 2 | 98 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 3 | 231 | |
| 1 | 112 | |
| 2 | 103 |
| Value | Count | Frequency (%) |
| 3 | 237 | |
| 1 | 111 | |
| 2 | 98 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 3 | 231 | |
| 1 | 112 | |
| 2 | 103 |
| Value | Count | Frequency (%) |
| 3 | 237 | |
| 1 | 111 | |
| 2 | 98 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 3 | 231 | |
| 1 | 112 | |
| 2 | 103 |
| Value | Count | Frequency (%) |
| 3 | 237 | |
| 1 | 111 | |
| 2 | 98 |
Name
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 82 | 67 |
| Median length | 52 | 50 |
| Mean length | 27.558296 | 26.878924 |
| Min length | 13 | 13 |
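The mean length ties the Length table to the character totals reported further down for Name (12291 for Dataset A and 11988 for Dataset B): multiplying the mean length by the number of observations should give the total number of characters. A minimal check, with an illustrative function name:

```python
N_OBSERVATIONS = 446

# Mean lengths copied from the Name "Length" table above.
MEAN_NAME_LENGTH = {"Dataset A": 27.558296, "Dataset B": 26.878924}


def total_characters(mean_length: float, n_rows: int) -> int:
    """Total characters in a text column, recovered from its mean length."""
    return round(mean_length * n_rows)


for name, mean_length in MEAN_NAME_LENGTH.items():
    print(f"{name}: {total_characters(mean_length, N_OBSERVATIONS)} characters")
```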
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 446 | 446 |
| Unique (%) | 100.0% | 100.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | Bowerman, Miss. Elsie Edith | Appleton, Mrs. Edward Dale (Charlotte Lamson) |
| 2nd row | Herman, Mrs. Samuel (Jane Laver) | White, Mr. Richard Frasar |
| 3rd row | Smiljanic, Mr. Mile | Kelly, Mrs. Florence "Fannie" |
| 4th row | Gale, Mr. Shadrach | Cohen, Mr. Gurshon "Gus" |
| 5th row | Turpin, Mr. William John Robert | Weisz, Mrs. Leopold (Mathilde Francoise Pede) |
| Value | Count | Frequency (%) |
| mr | 256 | 13.9% |
| miss | 92 | 5.0% |
| mrs | 68 | 3.7% |
| william | 45 | 2.4% |
| john | 21 | 1.1% |
| master | 20 | 1.1% |
| henry | 19 | 1.0% |
| mary | 13 | 0.7% |
| thomas | 12 | 0.6% |
| george | 11 | 0.6% |
| Other values (898) | 1291 |
| Value | Count | Frequency (%) |
| mr | 263 | 14.5% |
| miss | 93 | 5.1% |
| mrs | 66 | 3.7% |
| william | 25 | 1.4% |
| john | 23 | 1.3% |
| henry | 21 | 1.2% |
| master | 15 | 0.8% |
| richard | 13 | 0.7% |
| george | 13 | 0.7% |
| james | 13 | 0.7% |
| Other values (907) | 1263 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 1403 | 11.4% |
| r | 1003 | 8.2% |
| e | 867 | 7.1% |
| a | 863 | 7.0% |
| i | 703 | 5.7% |
| s | 658 | 5.4% |
| n | 647 | 5.3% |
| l | 589 | 4.8% |
| M | 560 | 4.6% |
| o | 484 | 3.9% |
| Other values (49) | 4514 |
| Value | Count | Frequency (%) |
| (space) | 1362 | 11.4% |
| r | 993 | 8.3% |
| e | 855 | 7.1% |
| a | 825 | 6.9% |
| n | 680 | 5.7% |
| i | 657 | 5.5% |
| s | 654 | 5.5% |
| M | 569 | 4.7% |
| l | 520 | 4.3% |
| o | 492 | 4.1% |
| Other values (49) | 4381 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 12291 |
| Value | Count | Frequency (%) |
| (unknown) | 11988 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| (space) | 1403 | 11.4% |
| r | 1003 | 8.2% |
| e | 867 | 7.1% |
| a | 863 | 7.0% |
| i | 703 | 5.7% |
| s | 658 | 5.4% |
| n | 647 | 5.3% |
| l | 589 | 4.8% |
| M | 560 | 4.6% |
| o | 484 | 3.9% |
| Other values (49) | 4514 |
| Value | Count | Frequency (%) |
| (space) | 1362 | 11.4% |
| r | 993 | 8.3% |
| e | 855 | 7.1% |
| a | 825 | 6.9% |
| n | 680 | 5.7% |
| i | 657 | 5.5% |
| s | 654 | 5.5% |
| M | 569 | 4.7% |
| l | 520 | 4.3% |
| o | 492 | 4.1% |
| Other values (49) | 4381 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 12291 |
| Value | Count | Frequency (%) |
| (unknown) | 11988 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| (space) | 1403 | 11.4% |
| r | 1003 | 8.2% |
| e | 867 | 7.1% |
| a | 863 | 7.0% |
| i | 703 | 5.7% |
| s | 658 | 5.4% |
| n | 647 | 5.3% |
| l | 589 | 4.8% |
| M | 560 | 4.6% |
| o | 484 | 3.9% |
| Other values (49) | 4514 |
| Value | Count | Frequency (%) |
| (space) | 1362 | 11.4% |
| r | 993 | 8.3% |
| e | 855 | 7.1% |
| a | 825 | 6.9% |
| n | 680 | 5.7% |
| i | 657 | 5.5% |
| s | 654 | 5.5% |
| M | 569 | 4.7% |
| l | 520 | 4.3% |
| o | 492 | 4.1% |
| Other values (49) | 4381 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 12291 |
| Value | Count | Frequency (%) |
| (unknown) | 11988 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| (space) | 1403 | 11.4% |
| r | 1003 | 8.2% |
| e | 867 | 7.1% |
| a | 863 | 7.0% |
| i | 703 | 5.7% |
| s | 658 | 5.4% |
| n | 647 | 5.3% |
| l | 589 | 4.8% |
| M | 560 | 4.6% |
| o | 484 | 3.9% |
| Other values (49) | 4514 |
| Value | Count | Frequency (%) |
| (space) | 1362 | 11.4% |
| r | 993 | 8.3% |
| e | 855 | 7.1% |
| a | 825 | 6.9% |
| n | 680 | 5.7% |
| i | 657 | 5.5% |
| s | 654 | 5.5% |
| M | 569 | 4.7% |
| l | 520 | 4.3% |
| o | 492 | 4.1% |
| Other values (49) | 4381 |
Sex
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 6 | 6 |
| Median length | 4 | 4 |
| Mean length | 4.7219731 | 4.7219731 |
| Min length | 4 | 4 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | female | female |
| 2nd row | female | male |
| 3rd row | male | female |
| 4th row | male | male |
| 5th row | male | female |
Common Values
| Value | Count | Frequency (%) |
| male | 285 | |
| female | 161 |
| Value | Count | Frequency (%) |
| male | 285 | |
| female | 161 |
Length
Histogram of lengths of the category
Common Values (Plot)
Dataset A
Dataset B
| Value | Count | Frequency (%) |
| male | 285 | |
| female | 161 |
| Value | Count | Frequency (%) |
| male | 285 | |
| female | 161 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 607 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 161 | 7.6% |
| Value | Count | Frequency (%) |
| e | 607 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 161 | 7.6% |
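The character counts above follow directly from the category counts: each occurrence of "male" contributes one `m`, `a`, `l` and `e`, and each "female" contributes one `f`, `m`, `a`, `l` and two `e`. The sketch below reproduces the table with `collections.Counter`; the function name is illustrative and not part of ydata-profiling.

```python
from collections import Counter

# Category counts copied from the Sex "Common Values" table
# (identical for Dataset A and Dataset B).
CATEGORY_COUNTS = {"male": 285, "female": 161}


def character_counts(category_counts: dict) -> Counter:
    """Weight each category's letters by how often the category occurs."""
    totals = Counter()
    for value, n_rows in category_counts.items():
        for char, per_value in Counter(value).items():
            totals[char] += per_value * n_rows
    return totals


chars = character_counts(CATEGORY_COUNTS)
total_chars = sum(chars.values())
for char, count in chars.most_common():
    print(f"{char}: {count} ({100 * count / total_chars:.1f}%)")
```

The total of 2106 characters is the figure reported under "Most occurring categories" for this variable, and `f` accounts for 161 of them, which rounds to the 7.6% shown.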
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 2106 |
| Value | Count | Frequency (%) |
| (unknown) | 2106 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| e | 607 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 161 | 7.6% |
| Value | Count | Frequency (%) |
| e | 607 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 161 | 7.6% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 2106 |
| Value | Count | Frequency (%) |
| (unknown) | 2106 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| e | 607 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 161 | 7.6% |
| Value | Count | Frequency (%) |
| e | 607 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 161 | 7.6% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 2106 |
| Value | Count | Frequency (%) |
| (unknown) | 2106 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| e | 607 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 161 | 7.6% |
| Value | Count | Frequency (%) |
| e | 607 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 161 | 7.6% |
Age
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 74 | 77 |
| Distinct (%) | 20.3% | 21.4% |
| Missing | 81 | 86 |
| Missing (%) | 18.2% | 19.3% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 29.600685 | 30.413639 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.75 | 0.75 |
| Maximum | 80 | 80 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.75 | 0.75 |
| 5-th percentile | 5 | 6 |
| Q1 | 21 | 21 |
| median | 28.5 | 29 |
| Q3 | 38 | 39 |
| 95-th percentile | 56 | 57 |
| Maximum | 80 | 80 |
| Range | 79.25 | 79.25 |
| Interquartile range (IQR) | 17 | 18 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 14.166069 | 14.326524 |
| Coefficient of variation (CV) | 0.47857233 | 0.4710559 |
| Kurtosis | 0.35899321 | 0.38199735 |
| Mean | 29.600685 | 30.413639 |
| Median Absolute Deviation (MAD) | 8.5 | 9 |
| Skewness | 0.43041228 | 0.44526272 |
| Sum | 10804.25 | 10948.91 |
| Variance | 200.67751 | 205.24929 |
| Monotonicity | Not monotonic | Not monotonic |
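Because Age has missing values, the denominators matter. The reported figures are consistent with Missing (%) being taken over all 446 observations, while Distinct (%) and Mean are taken over the non-missing values only. The sketch below checks that reading against the tables above; it is an inference from the numbers, not a statement about the library's internals, and the names are illustrative.

```python
import math

N_OBSERVATIONS = 446

# Values copied from the Age tables above.
AGE = {
    "Dataset A": {"distinct": 74, "missing": 81, "sum": 10804.25},
    "Dataset B": {"distinct": 77, "missing": 86, "sum": 10948.91},
}


def age_summary(stats: dict, n_rows: int = N_OBSERVATIONS) -> dict:
    """Recompute percentage and mean rows under the assumed denominators."""
    present = n_rows - stats["missing"]  # rows with a recorded age
    return {
        "missing_pct": round(100 * stats["missing"] / n_rows, 1),
        "distinct_pct": round(100 * stats["distinct"] / present, 1),
        "mean": stats["sum"] / present,
    }


for name, stats in AGE.items():
    print(name, age_summary(stats))

# Spot check against the reported mean for Dataset A.
assert math.isclose(age_summary(AGE["Dataset A"])["mean"], 29.600685, rel_tol=1e-6)
```

Dividing the distinct count by all 446 rows instead would give 16.6% for Dataset A, which does not match the reported 20.3%.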
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 22 | 16 | 3.6% |
| 24 | 15 | 3.4% |
| 36 | 13 | 2.9% |
| 29 | 13 | 2.9% |
| 28 | 13 | 2.9% |
| 30 | 12 | 2.7% |
| 25 | 12 | 2.7% |
| 18 | 12 | 2.7% |
| 32 | 11 | 2.5% |
| 35 | 11 | 2.5% |
| Other values (64) | 237 | |
| (Missing) | 81 | 18.2% |
| Value | Count | Frequency (%) |
| 22 | 15 | 3.4% |
| 18 | 14 | 3.1% |
| 21 | 13 | 2.9% |
| 28 | 13 | 2.9% |
| 25 | 13 | 2.9% |
| 30 | 13 | 2.9% |
| 24 | 13 | 2.9% |
| 32 | 12 | 2.7% |
| 19 | 11 | 2.5% |
| 26 | 11 | 2.5% |
| Other values (67) | 232 | |
| (Missing) | 86 | 19.3% |
| Value | Count | Frequency (%) |
| 0.75 | 1 | 0.2% |
| 1 | 4 | |
| 2 | 4 | |
| 3 | 4 | |
| 4 | 4 | |
| 5 | 4 | |
| 6 | 2 | |
| 7 | 2 | |
| 8 | 1 | 0.2% |
| 9 | 3 |
| Value | Count | Frequency (%) |
| 0.75 | 1 | 0.2% |
| 0.83 | 2 | 0.4% |
| 1 | 2 | 0.4% |
| 2 | 3 | |
| 3 | 1 | 0.2% |
| 4 | 6 | |
| 5 | 2 | 0.4% |
| 6 | 2 | 0.4% |
| 7 | 2 | 0.4% |
| 8 | 2 | 0.4% |
SibSp
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 7 | 7 |
| Distinct (%) | 1.6% | 1.6% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.53363229 | 0.50224215 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 8 | 8 |
| Zeros | 299 | 305 |
| Zeros (%) | 67.0% | 68.4% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 1 | 1 |
| 95-th percentile | 2 | 2 |
| Maximum | 8 | 8 |
| Range | 8 | 8 |
| Interquartile range (IQR) | 1 | 1 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 1.1287813 | 1.0380381 |
| Coefficient of variation (CV) | 2.1152792 | 2.066808 |
| Kurtosis | 18.825199 | 19.569565 |
| Mean | 0.53363229 | 0.50224215 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 3.84297 | 3.7477421 |
| Sum | 238 | 224 |
| Variance | 1.2741472 | 1.0775231 |
| Monotonicity | Not monotonic | Not monotonic |
Histogram with fixed size bins (bins=7)
| Value | Count | Frequency (%) |
| 0 | 299 | |
| 1 | 112 | 25.1% |
| 2 | 15 | 3.4% |
| 4 | 6 | 1.3% |
| 5 | 5 | 1.1% |
| 3 | 5 | 1.1% |
| 8 | 4 | 0.9% |
| Value | Count | Frequency (%) |
| 0 | 305 | |
| 1 | 103 | 23.1% |
| 2 | 18 | 4.0% |
| 3 | 8 | 1.8% |
| 4 | 8 | 1.8% |
| 8 | 3 | 0.7% |
| 5 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 299 | |
| 1 | 112 | 25.1% |
| 2 | 15 | 3.4% |
| 3 | 5 | 1.1% |
| 4 | 6 | 1.3% |
| 5 | 5 | 1.1% |
| 8 | 4 | 0.9% |
| Value | Count | Frequency (%) |
| 0 | 305 | |
| 1 | 103 | 23.1% |
| 2 | 18 | 4.0% |
| 3 | 8 | 1.8% |
| 4 | 8 | 1.8% |
| 5 | 1 | 0.2% |
| 8 | 3 | 0.7% |
Parch
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 6 | 7 |
| Distinct (%) | 1.3% | 1.6% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.37668161 | 0.367713 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 5 | 6 |
| Zeros | 345 | 345 |
| Zeros (%) | 77.4% | 77.4% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 0 | 0 |
| 95-th percentile | 2 | 2 |
| Maximum | 5 | 6 |
| Range | 5 | 6 |
| Interquartile range (IQR) | 0 | 0 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 0.83291486 | 0.81515845 |
| Coefficient of variation (CV) | 2.2111906 | 2.2168333 |
| Kurtosis | 9.8442571 | 11.54414 |
| Mean | 0.37668161 | 0.367713 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 2.8568304 | 2.9823302 |
| Sum | 168 | 164 |
| Variance | 0.69374717 | 0.6644833 |
| Monotonicity | Not monotonic | Not monotonic |
Histogram with fixed size bins (bins=6)
| Value | Count | Frequency (%) |
| 0 | 345 | |
| 1 | 54 | 12.1% |
| 2 | 38 | 8.5% |
| 5 | 4 | 0.9% |
| 4 | 3 | 0.7% |
| 3 | 2 | 0.4% |
| Value | Count | Frequency (%) |
| 0 | 345 | |
| 1 | 56 | 12.6% |
| 2 | 37 | 8.3% |
| 4 | 3 | 0.7% |
| 5 | 2 | 0.4% |
| 3 | 2 | 0.4% |
| 6 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 345 | |
| 1 | 54 | 12.1% |
| 2 | 38 | 8.5% |
| 3 | 2 | 0.4% |
| 4 | 3 | 0.7% |
| 5 | 4 | 0.9% |
| Value | Count | Frequency (%) |
| 0 | 345 | |
| 1 | 56 | 12.6% |
| 2 | 37 | 8.3% |
| 3 | 2 | 0.4% |
| 4 | 3 | 0.7% |
| 5 | 2 | 0.4% |
| 6 | 1 | 0.2% |
Ticket
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 383 | 389 |
| Distinct (%) | 85.9% | 87.2% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 18 | 18 |
| Median length | 17 | 17 |
| Mean length | 6.7331839 | 6.7421525 |
| Min length | 3 | 3 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 335 | 342 |
| Unique (%) | 75.1% | 76.7% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 113505 | 11769 |
| 2nd row | 220845 | 35281 |
| 3rd row | 315037 | 223596 |
| 4th row | 28664 | A/5 3540 |
| 5th row | 11668 | 228414 |
| Value | Count | Frequency (%) |
| pc | 32 | 5.6% |
| c.a | 14 | 2.5% |
| ca | 10 | 1.8% |
| a/5 | 8 | 1.4% |
| 2 | 7 | 1.2% |
| ston/o | 7 | 1.2% |
| 2144 | 5 | 0.9% |
| sc/paris | 5 | 0.9% |
| s.o.c | 4 | 0.7% |
| soton/oq | 4 | 0.7% |
| Other values (400) | 471 |
| Value | Count | Frequency (%) |
| pc | 27 | 4.8% |
| c.a | 12 | 2.1% |
| a/5 | 11 | 2.0% |
| ston/o | 6 | 1.1% |
| 2 | 6 | 1.1% |
| soton/o.q | 6 | 1.1% |
| ca | 6 | 1.1% |
| w./c | 6 | 1.1% |
| 347088 | 4 | 0.7% |
| 347082 | 4 | 0.7% |
| Other values (408) | 472 |
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 363 | |
| 1 | 341 | |
| 2 | 302 | |
| 7 | 249 | |
| 4 | 228 | 7.6% |
| 6 | 222 | 7.4% |
| 5 | 195 | 6.5% |
| 0 | 191 | 6.4% |
| 9 | 167 | 5.6% |
| 8 | 144 | 4.8% |
| Other values (22) | 601 |
| Value | Count | Frequency (%) |
| 3 | 393 | |
| 1 | 337 | |
| 2 | 300 | |
| 7 | 250 | |
| 4 | 228 | 7.6% |
| 0 | 205 | 6.8% |
| 6 | 195 | 6.5% |
| 5 | 194 | 6.5% |
| 9 | 162 | 5.4% |
| 8 | 154 | 5.1% |
| Other values (22) | 589 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 3003 |
| Value | Count | Frequency (%) |
| (unknown) | 3007 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 3 | 363 | |
| 1 | 341 | |
| 2 | 302 | |
| 7 | 249 | |
| 4 | 228 | 7.6% |
| 6 | 222 | 7.4% |
| 5 | 195 | 6.5% |
| 0 | 191 | 6.4% |
| 9 | 167 | 5.6% |
| 8 | 144 | 4.8% |
| Other values (22) | 601 |
| Value | Count | Frequency (%) |
| 3 | 393 | |
| 1 | 337 | |
| 2 | 300 | |
| 7 | 250 | |
| 4 | 228 | 7.6% |
| 0 | 205 | 6.8% |
| 6 | 195 | 6.5% |
| 5 | 194 | 6.5% |
| 9 | 162 | 5.4% |
| 8 | 154 | 5.1% |
| Other values (22) | 589 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 3003 |
| Value | Count | Frequency (%) |
| (unknown) | 3007 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 3 | 363 | |
| 1 | 341 | |
| 2 | 302 | |
| 7 | 249 | |
| 4 | 228 | 7.6% |
| 6 | 222 | 7.4% |
| 5 | 195 | 6.5% |
| 0 | 191 | 6.4% |
| 9 | 167 | 5.6% |
| 8 | 144 | 4.8% |
| Other values (22) | 601 |
| Value | Count | Frequency (%) |
| 3 | 393 | |
| 1 | 337 | |
| 2 | 300 | |
| 7 | 250 | |
| 4 | 228 | 7.6% |
| 0 | 205 | 6.8% |
| 6 | 195 | 6.5% |
| 5 | 194 | 6.5% |
| 9 | 162 | 5.4% |
| 8 | 154 | 5.1% |
| Other values (22) | 589 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 3003 |
| Value | Count | Frequency (%) |
| (unknown) | 3007 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 3 | 363 | |
| 1 | 341 | |
| 2 | 302 | |
| 7 | 249 | |
| 4 | 228 | 7.6% |
| 6 | 222 | 7.4% |
| 5 | 195 | 6.5% |
| 0 | 191 | 6.4% |
| 9 | 167 | 5.6% |
| 8 | 144 | 4.8% |
| Other values (22) | 601 |
| Value | Count | Frequency (%) |
| 3 | 393 | |
| 1 | 337 | |
| 2 | 300 | |
| 7 | 250 | |
| 4 | 228 | 7.6% |
| 0 | 205 | 6.8% |
| 6 | 195 | 6.5% |
| 5 | 194 | 6.5% |
| 9 | 162 | 5.4% |
| 8 | 154 | 5.1% |
| Other values (22) | 589 |
Fare
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 176 | 185 |
| Distinct (%) | 39.5% | 41.5% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 33.529167 | 33.342104 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 512.3292 | 512.3292 |
| Zeros | 8 | 10 |
| Zeros (%) | 1.8% | 2.2% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 7.225 | 7.0719 |
| Q1 | 8.05 | 7.925 |
| median | 15.2458 | 14.47915 |
| Q3 | 30.5 | 30.5 |
| 95-th percentile | 118.31875 | 130.2375 |
| Maximum | 512.3292 | 512.3292 |
| Range | 512.3292 | 512.3292 |
| Interquartile range (IQR) | 22.45 | 22.575 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 56.203453 | 53.993906 |
| Coefficient of variation (CV) | 1.6762556 | 1.6193911 |
| Kurtosis | 36.90768 | 31.015335 |
| Mean | 33.529167 | 33.342104 |
| Median Absolute Deviation (MAD) | 7.7229 | 7.22915 |
| Skewness | 5.2957188 | 4.7355989 |
| Sum | 14954.008 | 14870.579 |
| Variance | 3158.8282 | 2915.3419 |
| Monotonicity | Not monotonic | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 13 | 29 | 6.5% |
| 8.05 | 22 | 4.9% |
| 26 | 19 | 4.3% |
| 7.8958 | 16 | 3.6% |
| 7.75 | 12 | 2.7% |
| 10.5 | 12 | 2.7% |
| 7.775 | 9 | 2.0% |
| 7.2292 | 9 | 2.0% |
| 7.925 | 9 | 2.0% |
| 26.55 | 9 | 2.0% |
| Other values (166) | 300 |
| Value | Count | Frequency (%) |
| 13 | 26 | 5.8% |
| 7.8958 | 19 | 4.3% |
| 8.05 | 18 | 4.0% |
| 26 | 15 | 3.4% |
| 7.75 | 15 | 3.4% |
| 0 | 10 | 2.2% |
| 10.5 | 9 | 2.0% |
| 8.6625 | 8 | 1.8% |
| 26.55 | 8 | 1.8% |
| 7.925 | 8 | 1.8% |
| Other values (175) | 310 |
| Value | Count | Frequency (%) |
| 0 | 8 | |
| 5 | 1 | 0.2% |
| 6.2375 | 1 | 0.2% |
| 6.4958 | 1 | 0.2% |
| 6.75 | 1 | 0.2% |
| 6.8583 | 1 | 0.2% |
| 6.975 | 1 | 0.2% |
| 7.05 | 4 | |
| 7.0542 | 2 | 0.4% |
| 7.125 | 2 | 0.4% |
| Value | Count | Frequency (%) |
| 0 | 10 | |
| 4.0125 | 1 | 0.2% |
| 6.4375 | 1 | 0.2% |
| 6.4958 | 2 | 0.4% |
| 6.8583 | 1 | 0.2% |
| 6.95 | 1 | 0.2% |
| 7.0458 | 1 | 0.2% |
| 7.05 | 5 | |
| 7.0542 | 1 | 0.2% |
| 7.125 | 3 | 0.7% |
Cabin
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 85 | 89 |
| Distinct (%) | 80.2% | 88.1% |
| Missing | 340 | 345 |
| Missing (%) | 76.2% | 77.4% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 15 | 15 |
| Median length | 3 | 3 |
| Mean length | 3.6886792 | 3.6336634 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 68 | 77 |
| Unique (%) | 64.2% | 76.2% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | E33 | C101 |
| 2nd row | C86 | D26 |
| 3rd row | B96 B98 | E8 |
| 4th row | E101 | B22 |
| 5th row | F33 | B39 |
| Value | Count | Frequency (%) |
| b96 | 4 | 3.1% |
| b98 | 4 | 3.1% |
| e101 | 3 | 2.4% |
| c23 | 3 | 2.4% |
| c25 | 3 | 2.4% |
| c27 | 3 | 2.4% |
| f | 3 | 2.4% |
| f33 | 2 | 1.6% |
| e8 | 2 | 1.6% |
| b18 | 2 | 1.6% |
| Other values (85) | 98 |
| Value | Count | Frequency (%) |
| e67 | 2 | 1.7% |
| d20 | 2 | 1.7% |
| c23 | 2 | 1.7% |
| c25 | 2 | 1.7% |
| c27 | 2 | 1.7% |
| e101 | 2 | 1.7% |
| f4 | 2 | 1.7% |
| d35 | 2 | 1.7% |
| e44 | 2 | 1.7% |
| c2 | 2 | 1.7% |
| Other values (89) | 98 |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 38 | 9.7% |
| B | 37 | 9.5% |
| C | 35 | 9.0% |
| 2 | 35 | 9.0% |
| 3 | 29 | 7.4% |
| 5 | 28 | 7.2% |
| 6 | 24 | 6.1% |
| 8 | 21 | 5.4% |
| (space) | 21 | 5.4% |
| E | 19 | 4.9% |
| Other values (9) | 104 |
| Value | Count | Frequency (%) |
| C | 37 | 10.1% |
| 1 | 37 | 10.1% |
| 3 | 35 | 9.5% |
| 2 | 34 | 9.3% |
| B | 29 | 7.9% |
| 6 | 23 | 6.3% |
| 7 | 19 | 5.2% |
| 0 | 19 | 5.2% |
| E | 18 | 4.9% |
| 4 | 18 | 4.9% |
| Other values (9) | 98 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 391 |
| Value | Count | Frequency (%) |
| (unknown) | 367 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 1 | 38 | 9.7% |
| B | 37 | 9.5% |
| C | 35 | 9.0% |
| 2 | 35 | 9.0% |
| 3 | 29 | 7.4% |
| 5 | 28 | 7.2% |
| 6 | 24 | 6.1% |
| 8 | 21 | 5.4% |
| (space) | 21 | 5.4% |
| E | 19 | 4.9% |
| Other values (9) | 104 |
| Value | Count | Frequency (%) |
| C | 37 | 10.1% |
| 1 | 37 | 10.1% |
| 3 | 35 | 9.5% |
| 2 | 34 | 9.3% |
| B | 29 | 7.9% |
| 6 | 23 | 6.3% |
| 7 | 19 | 5.2% |
| 0 | 19 | 5.2% |
| E | 18 | 4.9% |
| 4 | 18 | 4.9% |
| Other values (9) | 98 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 391 |
| Value | Count | Frequency (%) |
| (unknown) | 367 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 1 | 38 | 9.7% |
| B | 37 | 9.5% |
| C | 35 | 9.0% |
| 2 | 35 | 9.0% |
| 3 | 29 | 7.4% |
| 5 | 28 | 7.2% |
| 6 | 24 | 6.1% |
| 8 | 21 | 5.4% |
| (space) | 21 | 5.4% |
| E | 19 | 4.9% |
| Other values (9) | 104 |
| Value | Count | Frequency (%) |
| C | 37 | 10.1% |
| 1 | 37 | 10.1% |
| 3 | 35 | 9.5% |
| 2 | 34 | 9.3% |
| B | 29 | 7.9% |
| 6 | 23 | 6.3% |
| 7 | 19 | 5.2% |
| 0 | 19 | 5.2% |
| E | 18 | 4.9% |
| 4 | 18 | 4.9% |
| Other values (9) | 98 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 391 |
| Value | Count | Frequency (%) |
| (unknown) | 367 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 1 | 38 | 9.7% |
| B | 37 | 9.5% |
| C | 35 | 9.0% |
| 2 | 35 | 9.0% |
| 3 | 29 | 7.4% |
| 5 | 28 | 7.2% |
| 6 | 24 | 6.1% |
| 8 | 21 | 5.4% |
| (space) | 21 | 5.4% |
| E | 19 | 4.9% |
| Other values (9) | 104 |
| Value | Count | Frequency (%) |
| C | 37 | 10.1% |
| 1 | 37 | 10.1% |
| 3 | 35 | 9.5% |
| 2 | 34 | 9.3% |
| B | 29 | 7.9% |
| 6 | 23 | 6.3% |
| 7 | 19 | 5.2% |
| 0 | 19 | 5.2% |
| E | 18 | 4.9% |
| 4 | 18 | 4.9% |
| Other values (9) | 98 |
Embarked
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 2 | 1 |
| Missing (%) | 0.4% | 0.2% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
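Distinct and Unique measure different things: Distinct is the number of different values, while Unique is the number of values that occur exactly once, which is how Embarked can have 3 distinct values and 0 unique ones. A minimal sketch of that distinction, assuming this reading of the two rows; the helper name and the toy column are illustrative and not taken from the data:

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (distinct, unique) for a column, ignoring missing values (None)."""
    counts = Counter(v for v in values if v is not None)
    distinct = len(counts)                              # different values seen
    unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once
    return distinct, unique

# Toy column (illustrative): three ports, each occurring more than once.
embarked = ["S", "S", "C", "S", "Q", "C", "Q", None]
print(distinct_and_unique(embarked))
```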
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | S | S |
| 2nd row | S | S |
| 3rd row | S | S |
| 4th row | S | S |
| 5th row | S | S |
Common Values
| Value | Count | Frequency (%) |
| S | 333 | 74.7% |
| C | 79 | 17.7% |
| Q | 32 | 7.2% |
| (Missing) | 2 | 0.4% |
| Value | Count | Frequency (%) |
| S | 330 | 74.0% |
| C | 76 | 17.0% |
| Q | 39 | 8.7% |
| (Missing) | 1 | 0.2% |
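Each frequency here is the count divided by the number of observations (446 per dataset, missing values included). A small sketch that recomputes the Dataset A column from its counts; the variable names are illustrative:

```python
# Counts from the Dataset A "Common Values" table for Embarked.
counts_a = {"S": 333, "C": 79, "Q": 32, "(Missing)": 2}

n_observations = sum(counts_a.values())  # 446 rows, missing values included

# Frequency (%) = count / observations, rounded to one decimal place.
frequency_a = {value: round(100 * count / n_observations, 1)
               for value, count in counts_a.items()}

for value, pct in frequency_a.items():
    print(f"{value}: {pct}%")
```

The character-level tables further down divide by the 444 non-missing values instead, which is why C appears there as 17.8% rather than 17.7%.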
Length
Histogram of lengths of the category
Common Values (Plot)
Dataset A
Dataset B
| Value | Count | Frequency (%) |
| s | 333 | 75.0% |
| c | 79 | 17.8% |
| q | 32 | 7.2% |
| Value | Count | Frequency (%) |
| s | 330 | 74.2% |
| c | 76 | 17.1% |
| q | 39 | 8.8% |
Most occurring characters
| Value | Count | Frequency (%) |
| S | 333 | 75.0% |
| C | 79 | 17.8% |
| Q | 32 | 7.2% |
| Value | Count | Frequency (%) |
| S | 330 | 74.2% |
| C | 76 | 17.1% |
| Q | 39 | 8.8% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 444 | 100.0% |
| Value | Count | Frequency (%) |
| (unknown) | 445 | 100.0% |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| S | 333 | 75.0% |
| C | 79 | 17.8% |
| Q | 32 | 7.2% |
| Value | Count | Frequency (%) |
| S | 330 | 74.2% |
| C | 76 | 17.1% |
| Q | 39 | 8.8% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 444 | 100.0% |
| Value | Count | Frequency (%) |
| (unknown) | 445 | 100.0% |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| S | 333 | 75.0% |
| C | 79 | 17.8% |
| Q | 32 | 7.2% |
| Value | Count | Frequency (%) |
| S | 330 | 74.2% |
| C | 76 | 17.1% |
| Q | 39 | 8.8% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 444 | 100.0% |
| Value | Count | Frequency (%) |
| (unknown) | 445 | 100.0% |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| S | 333 | 75.0% |
| C | 79 | 17.8% |
| Q | 32 | 7.2% |
| Value | Count | Frequency (%) |
| S | 330 | 74.2% |
| C | 76 | 17.1% |
| Q | 39 | 8.8% |
Interactions
Dataset A
Dataset B
Correlations
Dataset A
| | Age | Embarked | Fare | Parch | PassengerId | Pclass | Sex | SibSp | Survived |
|---|---|---|---|---|---|---|---|---|---|
| Age | 1.000 | 0.083 | 0.156 | -0.276 | 0.001 | 0.261 | 0.000 | -0.169 | 0.229 |
| Embarked | 0.083 | 1.000 | 0.210 | 0.031 | 0.000 | 0.242 | 0.143 | 0.000 | 0.200 |
| Fare | 0.156 | 0.210 | 1.000 | 0.390 | -0.028 | 0.449 | 0.231 | 0.426 | 0.298 |
| Parch | -0.276 | 0.031 | 0.390 | 1.000 | 0.038 | 0.000 | 0.248 | 0.439 | 0.200 |
| PassengerId | 0.001 | 0.000 | -0.028 | 0.038 | 1.000 | 0.114 | 0.049 | -0.090 | 0.150 |
| Pclass | 0.261 | 0.242 | 0.449 | 0.000 | 0.114 | 1.000 | 0.170 | 0.108 | 0.345 |
| Sex | 0.000 | 0.143 | 0.231 | 0.248 | 0.049 | 0.170 | 1.000 | 0.185 | 0.531 |
| SibSp | -0.169 | 0.000 | 0.426 | 0.439 | -0.090 | 0.108 | 0.185 | 1.000 | 0.141 |
| Survived | 0.229 | 0.200 | 0.298 | 0.200 | 0.150 | 0.345 | 0.531 | 0.141 | 1.000 |
Dataset B
| | Age | Embarked | Fare | Parch | PassengerId | Pclass | Sex | SibSp | Survived |
|---|---|---|---|---|---|---|---|---|---|
| Age | 1.000 | 0.069 | 0.104 | -0.284 | 0.091 | 0.214 | 0.019 | -0.167 | 0.091 |
| Embarked | 0.069 | 1.000 | 0.184 | 0.000 | 0.000 | 0.259 | 0.024 | 0.014 | 0.095 |
| Fare | 0.104 | 0.184 | 1.000 | 0.391 | 0.020 | 0.478 | 0.191 | 0.437 | 0.273 |
| Parch | -0.284 | 0.000 | 0.391 | 1.000 | 0.040 | 0.000 | 0.293 | 0.457 | 0.172 |
| PassengerId | 0.091 | 0.000 | 0.020 | 0.040 | 1.000 | 0.032 | 0.000 | -0.062 | 0.045 |
| Pclass | 0.214 | 0.259 | 0.478 | 0.000 | 0.032 | 1.000 | 0.150 | 0.104 | 0.362 |
| Sex | 0.019 | 0.024 | 0.191 | 0.293 | 0.000 | 0.150 | 1.000 | 0.255 | 0.569 |
| SibSp | -0.167 | 0.014 | 0.437 | 0.457 | -0.062 | 0.104 | 0.255 | 1.000 | 0.112 |
| Survived | 0.091 | 0.095 | 0.273 | 0.172 | 0.045 | 0.362 | 0.569 | 0.112 | 1.000 |
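For a pair of categorical variables such as Sex and Survived, an association measure like Cramér's V is a common choice for this kind of matrix. The sketch below assumes that measure (the report does not state which one it used), omits any bias correction, and runs on a made-up contingency table, so it is not expected to reproduce the 0.531 or 0.569 above:

```python
import math

def cramers_v(table):
    """Uncorrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))

# Hypothetical counts: rows are two categories of one variable, columns two of another.
table = [[30, 10],
         [10, 30]]
print(round(cramers_v(table), 3))
```

The value ranges from 0 (no association) to 1 (one variable fully determines the other), which is why these matrices show no negative entries for categorical pairs.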
Missing values
Dataset A
A simple visualization of nullity by column.
Dataset B
A simple visualization of nullity by column.
Dataset A
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completeness.
Dataset B
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completeness.
Dataset A
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
Dataset B
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
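Nullity correlation can be sketched as the Pearson correlation between two presence indicators (1 where a value is present, 0 where it is missing). The example uses the Age and Cabin values from the first Dataset A table in the Sample section; the function name is illustrative, and this is a sketch of the idea rather than the report's own code:

```python
import math

def nullity_correlation(col_x, col_y):
    """Pearson correlation between the presence indicators of two columns."""
    x = [0 if v is None else 1 for v in col_x]
    y = [0 if v is None else 1 for v in col_y]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Age and Cabin from the first Dataset A sample table (None marks NaN).
age = [22.0, 48.0, None, 34.0, 29.0, 50.0, 36.0, 17.0, 25.0, 25.0]
cabin = ["E33", None, None, None, None, "C86", "B96 B98", None, None, None]
print(round(nullity_correlation(age, cabin), 3))
```

A value near 1 means the two columns tend to be missing in the same rows, and a value near -1 means one tends to be present where the other is missing. Ten rows are far too few to say anything about the full datasets; they only keep the example small.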
Sample
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 356 | 357 | 1 | 1 | Bowerman, Miss. Elsie Edith | female | 22.0 | 0 | 1 | 113505 | 55.0000 | E33 | S |
| 754 | 755 | 1 | 2 | Herman, Mrs. Samuel (Jane Laver) | female | 48.0 | 1 | 2 | 220845 | 65.0000 | NaN | S |
| 158 | 159 | 0 | 3 | Smiljanic, Mr. Mile | male | NaN | 0 | 0 | 315037 | 8.6625 | NaN | S |
| 405 | 406 | 0 | 2 | Gale, Mr. Shadrach | male | 34.0 | 1 | 0 | 28664 | 21.0000 | NaN | S |
| 117 | 118 | 0 | 2 | Turpin, Mr. William John Robert | male | 29.0 | 1 | 0 | 11668 | 21.0000 | NaN | S |
| 544 | 545 | 0 | 1 | Douglas, Mr. Walter Donald | male | 50.0 | 1 | 0 | PC 17761 | 106.4250 | C86 | C |
| 763 | 764 | 1 | 1 | Carter, Mrs. William Ernest (Lucile Polk) | female | 36.0 | 1 | 2 | 113760 | 120.0000 | B96 B98 | S |
| 532 | 533 | 0 | 3 | Elias, Mr. Joseph Jr | male | 17.0 | 1 | 1 | 2690 | 7.2292 | NaN | C |
| 246 | 247 | 0 | 3 | Lindahl, Miss. Agda Thorilda Viktoria | female | 25.0 | 0 | 0 | 347071 | 7.7750 | NaN | S |
| 784 | 785 | 0 | 3 | Ali, Mr. William | male | 25.0 | 0 | 0 | SOTON/O.Q. 3101312 | 7.0500 | NaN | S |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 571 | 572 | 1 | 1 | Appleton, Mrs. Edward Dale (Charlotte Lamson) | female | 53.0 | 2 | 0 | 11769 | 51.4792 | C101 | S |
| 102 | 103 | 0 | 1 | White, Mr. Richard Frasar | male | 21.0 | 0 | 1 | 35281 | 77.2875 | D26 | S |
| 706 | 707 | 1 | 2 | Kelly, Mrs. Florence "Fannie" | female | 45.0 | 0 | 0 | 223596 | 13.5000 | NaN | S |
| 204 | 205 | 1 | 3 | Cohen, Mr. Gurshon "Gus" | male | 18.0 | 0 | 0 | A/5 3540 | 8.0500 | NaN | S |
| 133 | 134 | 1 | 2 | Weisz, Mrs. Leopold (Mathilde Francoise Pede) | female | 29.0 | 1 | 0 | 228414 | 26.0000 | NaN | S |
| 197 | 198 | 0 | 3 | Olsen, Mr. Karl Siegwart Andreas | male | 42.0 | 0 | 1 | 4579 | 8.4042 | NaN | S |
| 103 | 104 | 0 | 3 | Johansson, Mr. Gustaf Joel | male | 33.0 | 0 | 0 | 7540 | 8.6542 | NaN | S |
| 675 | 676 | 0 | 3 | Edvardsson, Mr. Gustaf Hjalmar | male | 18.0 | 0 | 0 | 349912 | 7.7750 | NaN | S |
| 652 | 653 | 0 | 3 | Kalvik, Mr. Johannes Halvorsen | male | 21.0 | 0 | 0 | 8475 | 8.4333 | NaN | S |
| 431 | 432 | 1 | 3 | Thorneycroft, Mrs. Percival (Florence Kate White) | female | NaN | 1 | 0 | 376564 | 16.1000 | NaN | S |
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 284 | 285 | 0 | 1 | Smith, Mr. Richard William | male | NaN | 0 | 0 | 113056 | 26.0000 | A19 | S |
| 218 | 219 | 1 | 1 | Bazzani, Miss. Albina | female | 32.0 | 0 | 0 | 11813 | 76.2917 | D15 | C |
| 290 | 291 | 1 | 1 | Barber, Miss. Ellen "Nellie" | female | 26.0 | 0 | 0 | 19877 | 78.8500 | NaN | S |
| 403 | 404 | 0 | 3 | Hakkarainen, Mr. Pekka Pietari | male | 28.0 | 1 | 0 | STON/O2. 3101279 | 15.8500 | NaN | S |
| 623 | 624 | 0 | 3 | Hansen, Mr. Henry Damsgaard | male | 21.0 | 0 | 0 | 350029 | 7.8542 | NaN | S |
| 195 | 196 | 1 | 1 | Lurette, Miss. Elise | female | 58.0 | 0 | 0 | PC 17569 | 146.5208 | B80 | C |
| 712 | 713 | 1 | 1 | Taylor, Mr. Elmer Zebley | male | 48.0 | 1 | 0 | 19996 | 52.0000 | C126 | S |
| 664 | 665 | 1 | 3 | Lindqvist, Mr. Eino William | male | 20.0 | 1 | 0 | STON/O 2. 3101285 | 7.9250 | NaN | S |
| 423 | 424 | 0 | 3 | Danbom, Mrs. Ernst Gilbert (Anna Sigrid Maria Brogren) | female | 28.0 | 1 | 1 | 347080 | 14.4000 | NaN | S |
| 349 | 350 | 0 | 3 | Dimic, Mr. Jovan | male | 42.0 | 0 | 0 | 315088 | 8.6625 | NaN | S |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 833 | 834 | 0 | 3 | Augustsson, Mr. Albert | male | 23.0 | 0 | 0 | 347468 | 7.8542 | NaN | S |
| 732 | 733 | 0 | 2 | Knight, Mr. Robert J | male | NaN | 0 | 0 | 239855 | 0.0000 | NaN | S |
| 432 | 433 | 1 | 2 | Louch, Mrs. Charles Alexander (Alice Adelaide Slow) | female | 42.0 | 1 | 0 | SC/AH 3085 | 26.0000 | NaN | S |
| 323 | 324 | 1 | 2 | Caldwell, Mrs. Albert Francis (Sylvia Mae Harbaugh) | female | 22.0 | 1 | 1 | 248738 | 29.0000 | NaN | S |
| 216 | 217 | 1 | 3 | Honkanen, Miss. Eliina | female | 27.0 | 0 | 0 | STON/O2. 3101283 | 7.9250 | NaN | S |
| 772 | 773 | 0 | 2 | Mack, Mrs. (Mary) | female | 57.0 | 0 | 0 | S.O./P.P. 3 | 10.5000 | E77 | S |
| 290 | 291 | 1 | 1 | Barber, Miss. Ellen "Nellie" | female | 26.0 | 0 | 0 | 19877 | 78.8500 | NaN | S |
| 828 | 829 | 1 | 3 | McCormack, Mr. Thomas Joseph | male | NaN | 0 | 0 | 367228 | 7.7500 | NaN | Q |
| 702 | 703 | 0 | 3 | Barbara, Miss. Saiide | female | 18.0 | 0 | 1 | 2691 | 14.4542 | NaN | C |
| 89 | 90 | 0 | 3 | Celotti, Mr. Francesco | male | 24.0 | 0 | 0 | 343275 | 8.0500 | NaN | S |
Duplicate rows
Dataset A
Dataset does not contain duplicate rows.
Dataset B
Dataset does not contain duplicate rows.